netdev CI testing #6666

Open

wants to merge 79 commits into base: bpf-next_base
Conversation

kuba-moo
Contributor

Reusable PR for hooking netdev CI to BPF testing.

@kernel-patches-daemon-bpf kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 4f22ee0 to 8a9a8e0 on March 28, 2024 04:46
@kuba-moo kuba-moo force-pushed the to-test branch 11 times, most recently from 64c403f to 8da1f58 on March 29, 2024 00:01
@kernel-patches-daemon-bpf kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 78ebb17 to 9325308 on March 29, 2024 02:14
@kuba-moo kuba-moo force-pushed the to-test branch 6 times, most recently from c8c7b2f to a71aae6 on March 29, 2024 18:01
@kuba-moo kuba-moo force-pushed the to-test branch 2 times, most recently from d8feb00 to b16a6b9 on March 30, 2024 00:01
@kuba-moo kuba-moo force-pushed the to-test branch 2 times, most recently from 4164329 to c5cecb3 on March 30, 2024 06:00
dark and others added 29 commits January 26, 2025 22:00
When a workqueue is created with `WQ_UNBOUND`, its work items are
served by special worker-pools, whose host workers are not bound to
any specific CPU. In the default configuration (i.e. when
`queue_delayed_work` and friends do not specify which CPU to run the
work item on), `WQ_UNBOUND` allows the work item to be executed on any
CPU in the same node as the CPU it was enqueued on. While this
solution potentially sacrifices locality, it avoids contention with
other processes that might dominate the CPU time of the processor the
work item was scheduled on.

This is not just a theoretical problem: in one particular scenario, a
misconfigured process was hogging most of CPU0's time, leaving less
than 0.5% of its CPU time to the kworker. The IDPF workqueues that
were using the kworker on CPU0 suffered large completion delays as a
result, causing performance degradation, timeouts, and an eventual
system crash.
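
A minimal sketch of the kind of change described, assuming the driver
workqueue is created in the probe path (the workqueue name and the
surrounding variables are illustrative):

  /* Allocate the workqueue as unbound so work items may run on any
   * CPU in the node instead of being pinned to the enqueueing CPU. */
  adapter->wq = alloc_workqueue("%s", WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
                                dev_name(&pdev->dev));
  if (!adapter->wq)
          return -ENOMEM;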

Tested:

* I have also run a manual test to gauge the performance
  improvement. The test consists of an antagonist process
  (`./stress --cpu 2`) consuming as much of CPU 0 as possible. This
  process is run under `taskset 01` to bind it to CPU0, and its
  priority is changed with `chrt -pQ 9900 10000 ${pid}` and
  `renice -n -20 ${pid}` after start.

  Then, the IDPF driver is forced to prefer CPU0 by editing all calls
  to `queue_delayed_work`, `mod_delayed_work`, etc... to use CPU 0.

  Finally, `ktraces` for the workqueue events are collected.

  Without the current patch, the antagonist process can force
  arbitrary delays between `workqueue_queue_work` and
  `workqueue_execute_start`, which in my tests were as high as
  `30ms`. With the current patch applied, the workqueue can be
  migrated to another unloaded CPU in the same node and, keeping
  everything else equal, the maximum delay I could see was `6us`.

Fixes: 0fe4546 ("idpf: add create vport and netdev configuration")
Signed-off-by: Marco Leogrande <[email protected]>
Signed-off-by: Manoj Vishwanathan <[email protected]>
Signed-off-by: Brian Vazquez <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Pavan Kumar Linga <[email protected]>
Tested-by: Krishneil Singh <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Add more information related to the transaction, like cookie, vc_op,
and salt, when a transaction times out, and include similar information
when the transaction salt does not match.

Info output for transaction timeout:
-------------------
(op:5015 cookie:45fe vc_op:5015 salt:45 timeout:60000ms)
-------------------

before it was:

-------------------
(op 5015, 60000ms)
-------------------
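
A hypothetical sketch of the extended message; the variable and field
names here are illustrative, not the driver's actual identifiers:

  /* log all transaction details on timeout (names illustrative) */
  dev_err(&adapter->pdev->dev,
          "transaction timed out (op:%d cookie:%04x vc_op:%d salt:%02x timeout:%dms)\n",
          xn->op, xn->cookie, xn->vc_op, xn->salt, timeout_ms);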

Signed-off-by: Manoj Vishwanathan <[email protected]>
Signed-off-by: Brian Vazquez <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Reviewed-by: Pavan Kumar Linga <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Tested-by: Krishneil Singh <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Fix &ice_parser_rt::bst_key size. It was wrongly set to 10 instead of 20
in the initial implementation commit (see Fixes tag). All usage code
assumed it was of size 20. That was also the initial size present up to
v2 of the intro series [2], but it was halved by the v3 [3] refactor
described as "Replace magic hardcoded values with macros." The
introducing series was so big that some ugliness went unnoticed, and the
same holds for bugs.

ICE_BST_KEY_TCAM_SIZE and ICE_BST_TCAM_KEY_SIZE differed by one.
There was a tmp variable @j in the scope of the edited function, but it
was not used in all places. This ugliness is now gone.
I'm moving ice_parser_rt::pg_prio a few positions up, to fill up one of
the holes, in order to compensate for the 10 bytes added to ::bst_key.
This results in the same overall struct size as prior to the fix, with
minimal changes in the offsets of the fields.

Also extend the debug dump print of the key to cover all bytes. To avoid
a format string with 20 "%02x" specifiers and 20 parameters, switch to
ice_debug_array_w_prefix().

This fix obsoletes Ahmed's attempt at [1].

[1] https://lore.kernel.org/intel-wired-lan/[email protected]
[2] https://lore.kernel.org/intel-wired-lan/[email protected]
[3] https://lore.kernel.org/intel-wired-lan/[email protected]
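
An illustrative sketch of the resulting layout; only the two fields
named above are shown, and the surrounding members are elided:

  #define ICE_BST_TCAM_KEY_SIZE 20        /* was wrongly 10 */

  struct ice_parser_rt {
          /* ... */
          u8 pg_prio;     /* moved up to fill a hole */
          u8 bst_key[ICE_BST_TCAM_KEY_SIZE];
          /* ... */
  };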

Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/intel-wired-lan/[email protected]
Fixes: 9a4c07a ("ice: add parser execution main loop")
CC: Ahmed Zaki <[email protected]>
Reviewed-by: Larysa Zaremba <[email protected]>
Signed-off-by: Przemek Kitszel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Commit 7083893 ("ice: Implement driver functionality to dump serdes
equalizer values") added the invalid DRATE parameter to the set of
values read. The output of the command:

  $ ethtool -d <ethX>

returns a garbage value in the place where the DRATE value should be
stored.

Remove the mentioned parameter to prevent returning corrupted data to
userspace.

Fixes: 7083893 ("ice: Implement driver functionality to dump serdes equalizer values")
Signed-off-by: Mateusz Polchlopek <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Tested-by: Rinitha S <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
First case:
> ip l a l $VF name vlanx type vlan id 100
> ip l d vlanx
> ip l a l $VF name vlanx type vlan id 100

As the workqueue can execute after some time, there is a window for a
call trace like this:
- iavf_del_vlan
- iavf_add_vlan
- iavf_del_vlans (wq)

It means that our VLAN 100 will change state from IAVF_VLAN_ACTIVE
to IAVF_VLAN_REMOVE (iavf_del_vlan). After that, iavf_add_vlan won't
change the state because VLAN 100 is already on the filter list. The
final result is that the VLAN 100 filter isn't added in hardware (no
iavf_add_vlans call).

To fix that, change the state directly to active if the filter wasn't
removed yet. This is safe, as IAVF_VLAN_REMOVE means the virtchnl
message hasn't been sent yet.

Second case:
> ip l a l $VF name vlanx type vlan id 100
Any type of VF reset, e.g. a trust change:
> ip l s $PF vf $VF_NUM trust on
> ip l d vlanx
> ip l a l $VF name vlanx type vlan id 100

In case of a reset, the iavf driver is responsible for re-adding all
filters that are in use. To do that, the state of all VLAN filters is
changed to IAVF_VLAN_ADD. Here the window for changing the VLAN state
from the kernel side is even longer, as the workqueue isn't called
immediately. We can have a call trace like this:

- changing to IAVF_VLAN_ADD (after reset)
- iavf_del_vlan (called from kernel ops)
- iavf_del_vlans (wq)

Non-existing VLAN filters will be removed from hardware. That isn't a
bug; the ice driver will handle it fine. However, we can also have a
call trace like this:

- changing to IAVF_VLAN_ADD (after reset)
- iavf_del_vlan (called from kernel ops)
- iavf_add_vlan (called from kernel ops)
- iavf_del_vlans (wq)

With the fix for the previous case, we end up with no VLAN filters in
hardware. We have to remove a VLAN filter if its state is IAVF_VLAN_ADD
and a VLAN delete was requested. This is safe, as IAVF_VLAN_ADD means
the virtchnl message hasn't been sent yet.
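
A minimal sketch of the two rules described, with simplified control
flow; the filter lookup and locking are elided:

  /* in iavf_add_vlan(), when the filter is already on the list: */
  if (f->state == IAVF_VLAN_REMOVE)
          f->state = IAVF_VLAN_ACTIVE;    /* removal not sent to PF yet */

  /* in iavf_del_vlan(): */
  if (f->state == IAVF_VLAN_ADD) {
          /* add not sent to PF yet: just forget the filter */
          list_del(&f->list);
          kfree(f);
  } else {
          f->state = IAVF_VLAN_REMOVE;
  }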

Fixes: 0c0da0e ("iavf: refactor VLAN filter states")
Signed-off-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Pointer arguments passed to ioctls need to pass through compat_ptr() to
work correctly on s390; as explained in Documentation/driver-api/ioctl.rst.
Detect compat mode at runtime and call compat_ptr() for those commands
which do take pointer arguments.
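
A minimal sketch of the pattern, following
Documentation/driver-api/ioctl.rst; which commands take pointer
arguments is driver-specific (PTP_ENABLE_PPS, for example, passes a
plain integer):

  void __user *argptr = (void __user *)arg;

  /* s390 compat tasks encode user pointers specially, so pointer
   * arguments must be converted with compat_ptr() */
  if (in_compat_syscall() && cmd != PTP_ENABLE_PPS)
          argptr = compat_ptr(arg);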

Suggested-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Fixes: d94ba80 ("ptp: Added a brand new class driver for ptp clocks.")
Signed-off-by: Thomas Weißschuh <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Originally, it was possible for the DPE length check to overflow if
wDatagramIndex + wDatagramLength > U16_MAX. This could lead to an OoB
read.

Move the wDatagramIndex term to the other side of the inequality.

An existing condition ensures that wDatagramIndex < urb->actual_length.
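
A hypothetical sketch of the rearranged check; the local variable names
are illustrative:

  u16 dg_idx = le16_to_cpu(dpe->wDatagramIndex);
  u16 dg_len = le16_to_cpu(dpe->wDatagramLength);

  /* before: dg_idx + dg_len > urb->actual_length  (sum may exceed U16_MAX)
   * after:  move dg_idx across the inequality; dg_idx < urb->actual_length
   *         is already guaranteed by an existing condition */
  if (dg_len > urb->actual_length - dg_idx)
          goto rx_error;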

Fixes: a2d274c ("usbnet: ipheth: add CDC NCM support")
Cc: [email protected]
Signed-off-by: Foster Snowhill <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
By definition, a DPE points at the start of a network frame/datagram.
Thus it makes no sense for it to point at anything that's part of the
NCM header. It is not a security issue, but merely an indication of
a malformed DPE.

Enforce that all DPEs point at the data portion of the URB, past the
NCM header.

Fixes: a2d274c ("usbnet: ipheth: add CDC NCM support")
Cc: [email protected]
Signed-off-by: Foster Snowhill <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Original code allowed for the start of NDP16 to be anywhere within the
URB based on the `wNdpIndex` value in NTH16. Only the start position of
NDP16 was checked, so it was possible for even the fixed-length part
of NDP16 to extend past the end of URB, leading to an out-of-bounds
read.

On iOS devices, the NDP16 header always directly follows NTH16. Rely on
and check for this specific format.

This, along with NCM-specific minimal URB length check that already
exists, will ensure that the fixed-length part of NDP16 plus a set
amount of DPEs fit within the URB.

Note that this commit alone does not fully address the OoB read.
The limit on the amount of DPEs needs to be enforced separately.
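
A hypothetical sketch of the check described; the buffer handling is
simplified:

  struct usb_cdc_ncm_nth16 *nth16 = urb->transfer_buffer;

  /* on iOS devices NDP16 directly follows NTH16, so require exactly
   * that layout instead of trusting an arbitrary wNdpIndex */
  if (le16_to_cpu(nth16->wNdpIndex) != sizeof(struct usb_cdc_ncm_nth16))
          goto rx_error;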

Fixes: a2d274c ("usbnet: ipheth: add CDC NCM support")
Cc: [email protected]
Signed-off-by: Foster Snowhill <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Introduce an rx_error label to reduce repetitions in the header
signature checks.

Store wDatagramIndex and wDatagramLength after endianness conversion to
avoid repeated le16_to_cpu() calls.

Rewrite the loop to return on a null trailing DPE, which is required
by the CDC NCM spec. In case it is missing, fall through to rx_error.

This change does not fix any particular issue. Its purpose is to
simplify a subsequent commit that fixes a potential OoB read by limiting
the maximum amount of processed DPEs.
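
A simplified sketch of the reworked loop shape; the per-datagram
handling is elided and the DPE bound comes from the subsequent commit:

  for (i = 0; i < max_dpe; i++, dpe++) {
          dg_idx = le16_to_cpu(dpe->wDatagramIndex);
          dg_len = le16_to_cpu(dpe->wDatagramLength);

          /* a null trailing DPE terminates the NDP per the CDC NCM
           * spec: the normal, non-error exit of the loop */
          if (dg_idx == 0 || dg_len == 0)
                  return 0;       /* assuming an int-returning rx helper */

          /* ... validate and submit the datagram ... */
  }

  /* no null trailing DPE found: treat the URB as malformed */
  rx_error:
          dev->net->stats.rx_errors++;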

Cc: [email protected] # 6.5.x
Signed-off-by: Foster Snowhill <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Originally, the total NCM header size was computed as the sum of two
vaguely labelled constants. While accurate, it wasn't particularly clear
where they were coming from.

Use sizes of existing NCM structs where available. Define the total
NDP16 size based on the maximum amount of DPEs that can fit into the
iOS-specific fixed-size header.

This change does not fix any particular issue. Rather, it introduces
intermediate constants that will simplify subsequent commits.
It should also make it clearer for the reader where the constant values
come from.
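
An illustrative sketch of such derived constants; the DPE count below
is a made-up placeholder, not the driver's actual value:

  #define IPHETH_NDP16_MAX_DPE    22      /* placeholder value */
  #define IPHETH_NDP16_HEADER_SIZE (sizeof(struct usb_cdc_ncm_ndp16) + \
                                    IPHETH_NDP16_MAX_DPE * \
                                    sizeof(struct usb_cdc_ncm_dpe16))
  #define IPHETH_NCM_HEADER_SIZE  (sizeof(struct usb_cdc_ncm_nth16) + \
                                   IPHETH_NDP16_HEADER_SIZE)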

Cc: [email protected] # 6.5.x
Signed-off-by: Foster Snowhill <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Fix an out-of-bounds DPE read: limit the number of processed DPEs to
the amount that fits into the fixed-size NDP16 header.

Fixes: a2d274c ("usbnet: ipheth: add CDC NCM support")
Cc: [email protected]
Signed-off-by: Foster Snowhill <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Clarify that the "NCM" implementation in `ipheth` is very limited, as
iOS devices aren't compatible with the CDC NCM specification in regular
tethering mode.

For a standards-compliant implementation, one should turn to
the `cdc_ncm` module.

Cc: [email protected] # 6.5.x
Signed-off-by: Foster Snowhill <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Expected behaviour:
In case we reach the scheduler's limit, pfifo_tail_enqueue() will drop
a packet from the scheduler's queue and decrease the scheduler's qlen
by one. Then, pfifo_tail_enqueue() enqueues the new packet and
increases the scheduler's qlen by one. Finally, pfifo_tail_enqueue()
returns the `NET_XMIT_CN` status code.

Weird behaviour:
In case we set `sch->limit == 0` and trigger pfifo_tail_enqueue() on a
scheduler that has no packets, the 'drop a packet' step will do
nothing. This means the scheduler's qlen still equals 0. Then, we
continue to enqueue the new packet and increase the scheduler's qlen by
one. In summary, we can leverage pfifo_tail_enqueue() to increase qlen
by one and still return the `NET_XMIT_CN` status code.

The problem is:
Let's say we have two qdiscs: Qdisc_A and Qdisc_B.
 - Qdisc_A's type must have a '->graft()' function to create a parent/child relationship.
   Let's say Qdisc_A's type is `hfsc`. Enqueueing a packet to this qdisc triggers `hfsc_enqueue`.
 - Qdisc_B's type is pfifo_head_drop. Enqueueing a packet to this qdisc triggers `pfifo_tail_enqueue`.
 - Qdisc_B is configured to have `sch->limit == 0`.
 - Qdisc_A is configured to route enqueued packets to Qdisc_B.

Enqueueing a packet through Qdisc_A leads to:
 - hfsc_enqueue(Qdisc_A) -> pfifo_tail_enqueue(Qdisc_B)
 - Qdisc_B->q.qlen += 1
 - pfifo_tail_enqueue() returns `NET_XMIT_CN`
 - hfsc_enqueue() checks for `NET_XMIT_SUCCESS`, sees `NET_XMIT_CN`, and therefore does not increase Qdisc_A's qlen.

The whole process leads to a situation where Qdisc_A->q.qlen == 0 and Qdisc_B->q.qlen == 1.
Replacing 'hfsc' with another type (for example 'drr') still leads to the same problem.
This violates the design rule that a parent's qlen should equal the sum of its children's qlens.

Bug impact: This issue can be used for user->kernel privilege escalation when it is reachable.
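
For reference, a simplified sketch of the enqueue path discussed above,
modeled on net/sched/sch_fifo.c:

  static int pfifo_tail_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                                struct sk_buff **to_free)
  {
          unsigned int prev_backlog;

          if (likely(sch->q.qlen < sch->limit))
                  return qdisc_enqueue_tail(skb, sch);

          /* with sch->limit == 0 and an empty queue, this drop is a
           * no-op, yet the new skb is still enqueued below: qlen grows
           * to 1 while the caller sees NET_XMIT_CN */
          prev_backlog = sch->qstats.backlog;
          __qdisc_queue_drop_head(sch, &sch->q, to_free);
          qdisc_enqueue_tail(skb, sch);
          qdisc_tree_reduce_backlog(sch, 0, prev_backlog - sch->qstats.backlog);

          return NET_XMIT_CN;
  }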

Reported-by: Quang Le <[email protected]>
Signed-off-by: Quang Le <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
…limit==0

When limit == 0, pfifo_tail_enqueue() must drop the new packet and
increase the qdisc's dropped-packet count.
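
A minimal sketch of the fix described, as an early test in
pfifo_tail_enqueue():

  /* queue limit of zero: there is nothing to make room for, so the
   * new packet is dropped and counted against the qdisc */
  if (unlikely(sch->limit == 0))
          return qdisc_drop(skb, sch, to_free);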

All test results:

1..16
ok 1 a519 - Add bfifo qdisc with system default parameters on egress
ok 2 585c - Add pfifo qdisc with system default parameters on egress
ok 3 a86e - Add bfifo qdisc with system default parameters on egress with handle of maximum value
ok 4 9ac8 - Add bfifo qdisc on egress with queue size of 3000 bytes
ok 5 f4e6 - Add pfifo qdisc on egress with queue size of 3000 packets
ok 6 b1b1 - Add bfifo qdisc with system default parameters on egress with invalid handle exceeding maximum value
ok 7 8d5e - Add bfifo qdisc on egress with unsupported argument
ok 8 7787 - Add pfifo qdisc on egress with unsupported argument
ok 9 c4b6 - Replace bfifo qdisc on egress with new queue size
ok 10 3df6 - Replace pfifo qdisc on egress with new queue size
ok 11 7a67 - Add bfifo qdisc on egress with queue size in invalid format
ok 12 1298 - Add duplicate bfifo qdisc on egress
ok 13 45a0 - Delete nonexistent bfifo qdisc
ok 14 972b - Add prio qdisc on egress with invalid format for handles
ok 15 4d39 - Delete bfifo qdisc twice
ok 16 d774 - Check pfifo_head_drop qdisc enqueue behaviour when limit == 0

Signed-off-by: Quang Le <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
qdisc_tree_reduce_backlog() notifies the parent qdisc only if the child
qdisc becomes empty; therefore we need to reduce the backlog of the
child qdisc before calling it. Otherwise the call becomes a no-op and
results in a UAF in the DRR case (a test for which is integrated in the
following patch).
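
A hypothetical sketch of the ordering described; the qdisc pointer and
skb accounting are illustrative:

  /* decrement the child's qlen/backlog first ... */
  qdisc_qstats_backlog_dec(q->qdisc, skb);
  q->qdisc->q.qlen--;
  /* ... so the "child is now empty" test inside
   * qdisc_tree_reduce_backlog() can fire and notify the parents */
  qdisc_tree_reduce_backlog(q->qdisc, 1, qdisc_pkt_len(skb));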

Fixes: f8d4bc4 ("net/sched: netem: account for backlog updates from child qdisc")
Cc: Martin Ottens <[email protected]>
Reported-by: Mingi Cho <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Integrate the test case provided by Mingi Cho into TDC.

All test results:

1..4
ok 1 ca5e - Check class delete notification for ffff:
ok 2 e4b7 - Check class delete notification for root ffff:
ok 3 33a9 - Check ingress is not searchable on backlog update
ok 4 a4b9 - Test class qlen notification

Cc: Mingi Cho <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Max Schulze <[email protected]>
Tested-by: Max Schulze <[email protected]>
Suggested-by: David Hollis <[email protected]>
Reported-by: Sven Kreiensen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Replace vmalloc() allocations with kvmalloc(), which tries a kmalloc()
first and only falls back to vmalloc() when necessary, making the
allocation more flexible.

Signed-off-by: Denis Kirjanov <[email protected]>
Reviewed-by: Florian Westphal <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
The sanity check that both source and destination are set when symmetric
RSS hash is requested is only relevant for ETHTOOL_SRXFH (rx-flow-hash),
it should not be performed on any other commands (e.g.
ETHTOOL_SRXCLSRLINS/ETHTOOL_SRXCLSRLDEL).

This resolves access to the uninitialized 'info.data' field and fixes
false errors in rule insertion:
  # ethtool --config-ntuple eth2 flow-type ip4 dst-ip 255.255.255.255 action -1 loc 0
  rmgr: Cannot insert RX class rule: Invalid argument
  Cannot insert classification rule
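
A hypothetical sketch of the narrowed check; the helper shown here is
illustrative, not the actual function in net/ethtool/ioctl.c:

  /* only ETHTOOL_SRXFH carries hash-field data worth validating */
  if (info.cmd == ETHTOOL_SRXFH &&
      !rxfh_sym_fields_valid(info.data))      /* hypothetical helper */
          return -EINVAL;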

Fixes: 13e5934 ("net: ethtool: add support for symmetric-xor RSS hash")
Cc: Ahmed Zaki <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Gal Pressman <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
The number of MTL queues to use is specified by the parameter
"snps,{tx,rx}-queues-to-use" from the stmmac_platform layer.

However, the maximum number of queues is constrained by upper limits
determined by the capabilities of the hardware. Limit the values so
they do not exceed those upper limits, and display a warning message
when clamping occurs.

This only works if the hardware capability fields provide the upper
limit values.
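
A hypothetical sketch of the clamping for the Tx side; the Rx side
would be symmetric (field names follow the stmmac driver, the message
text is illustrative):

  if (priv->dma_cap.number_tx_queues &&
      priv->plat->tx_queues_to_use > priv->dma_cap.number_tx_queues) {
          dev_warn(priv->device,
                   "Tx queues %u exceed hw capability, limiting to %u\n",
                   priv->plat->tx_queues_to_use,
                   priv->dma_cap.number_tx_queues);
          priv->plat->tx_queues_to_use = priv->dma_cap.number_tx_queues;
  }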

Fixes: d976a52 ("net: stmmac: multiple queues dt configuration")
Signed-off-by: Kunihiko Hayashi <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Tx/Rx FIFO size is specified by the parameter "{tx,rx}-fifo-depth" from
stmmac_platform layer.

However, these values are constrained by upper limits determined by the
capabilities of the hardware. There is a risk that the upper bits will
be truncated by the calculation, so limit the values to the upper
limits and display a warning message when clamping occurs.

This only works if the hardware capability fields provide the upper
limit values.

Fixes: e7877f5 ("stmmac: Read tx-fifo-depth and rx-fifo-depth from the devicetree")
Signed-off-by: Kunihiko Hayashi <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
…pecified

When the Tx/Rx FIFO size is not specified in advance, the driver checks
whether the value is zero and substitutes the hardware capability value
in each function where the value is used.

Consolidate the check and assignment into stmmac_hw_init() and remove
the other, now-redundant statements.

If the FIFO size is zero and the hardware capability also doesn't
provide an upper limit value, return with an error message.
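
A hypothetical sketch of the consolidated check for the Tx side in
stmmac_hw_init(); the Rx side would be symmetric:

  if (!priv->plat->tx_fifo_size) {
          if (!priv->dma_cap.tx_fifo_size) {
                  dev_err(priv->device, "Can't determine Tx FIFO size\n");
                  return -ENODEV;
          }
          priv->plat->tx_fifo_size = priv->dma_cap.tx_fifo_size;
  }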

Signed-off-by: Kunihiko Hayashi <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
tc_actions.sh keeps hanging the forwarding tests.

sdf@: tdc & tdc-dbg started intermittently failing around Sep 25th

Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
This reverts commit 76d5d4c.

Signed-off-by: NipaLocal <nipa@local>